Safety-constrained reinforcement learning with a distributional safety critic
Authors
Abstract
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the safety aspects with a safety-cost signal separate from the reward, and bounding the expected safety-cost, is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, it can be risky to set constraints only on the expectation while neglecting the tail of the distribution, which might have prohibitively large values. In this paper, we propose a method called Worst-Case Soft Actor Critic for safe RL that approximates the distribution of accumulated safety-costs to achieve risk control. More specifically, a certain level of conditional Value-at-Risk of this distribution is regarded as the safety constraint, which guides the adjustment of adaptive weights that trade off reward and safety. As a result, we compute policies whose worst-case performance satisfies the constraints. We investigate two ways to estimate the safety-cost distribution, namely a Gaussian approximation and a quantile regression algorithm. On the one hand, the former is simple and easy to implement but may underestimate the safety cost; on the other hand, the latter leads to more conservative behavior. The empirical analysis shows that the approach achieves excellent results in complex safety-constrained environments.
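The abstract contrasts the Gaussian and quantile-regression estimates of the safety-cost distribution only verbally. The minimal Python sketch below (not taken from the paper; the function names, the risk level alpha, and the cost budget d are illustrative assumptions) shows how a CVaR safety constraint could be evaluated under each kind of estimate.

# A minimal sketch (not the authors' implementation): checking a CVaR safety
# constraint under the two distribution estimates mentioned in the abstract.
# Names such as cvar_gaussian, alpha, and the cost budget d are illustrative.
import numpy as np
from scipy.stats import norm

def cvar_gaussian(mean, std, alpha=0.9):
    # Closed-form upper-tail CVaR of a Gaussian safety-cost estimate
    # N(mean, std^2): mean + std * phi(Phi^{-1}(alpha)) / (1 - alpha),
    # i.e. the expected cost over the worst (1 - alpha) fraction of outcomes.
    z = norm.ppf(alpha)
    return mean + std * norm.pdf(z) / (1.0 - alpha)

def cvar_from_quantiles(quantiles, alpha=0.9):
    # Empirical CVaR from (roughly equally spaced) quantile estimates of the
    # safety-cost distribution, e.g. the output of a quantile-regression
    # critic: average the quantiles that lie above the alpha level.
    q = np.sort(np.asarray(quantiles))
    k = min(int(np.ceil(alpha * len(q))), len(q) - 1)
    return q[k:].mean()

# The policy is treated as safe when the CVaR stays below a cost budget d.
mean_cost, std_cost, d = 5.0, 2.0, 10.0
print(cvar_gaussian(mean_cost, std_cost) <= d)
samples = np.random.normal(mean_cost, std_cost, size=200)  # stand-in quantile estimates
print(cvar_from_quantiles(samples) <= d)

In the setting described above, such a CVaR value would replace the expected cost in the constraint: the Gaussian form is cheap to compute but can underestimate the tail, while the quantile-based estimate tends to be more conservative, matching the trade-off stated in the abstract.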
Similar resources
Safety-Constrained Reinforcement Learning for MDPs
We consider controller synthesis for stochastic and partially unknown environments in which safety is essential. Specifically, we abstract the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space. Standard learning approaches synthesize cost-optimal strategies without guaranteein...
Dynamic Control with Actor-Critic Reinforcement Learning
4 Actor-Critic Marble Control: 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...
Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation
This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback control...
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build ...
Supervised Actor-Critic Reinforcement Learning
Editor’s Summary: Chapter ?? introduced policy gradients as a way to improve on stochastic search of the policy space when learning. This chapter presents supervised actor-critic reinforcement learning as another method for improving the effectiveness of learning. With this approach, a supervisor adds structure to a learning problem and supervised learning makes that structure part of an actor-...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06187-8